Applying recursion to serial and parallel QR factorization leads to better performance

Authors

Erik Elmroth

Fred G. Gustavson

Abstract

We present new recursive serial and parallel algorithms for QR factorization of an m by n matrix. They improve performance. The recursion leads to an automatic variable blocking, and it also replaces a Level 2 part in a standard block algorithm with Level 3 operations. However, there are significant additional costs for creating and performing the updates, which prohibit the efficient use of the recursion for large n. We present a quantitative analysis of these extra costs. This analysis leads us to introduce a hybrid recursive algorithm that outperforms the LAPACK algorithm DGEQRF by about 20% for large square matrices and up to almost a factor of 3 for tall thin matrices. Uniprocessor performance results are presented for two IBM RS/6000 SP nodes—a 120-MHz IBM POWER2 node and one processor of a four-way 332-MHz IBM PowerPC 604e SMP node. The hybrid recursive algorithm reaches more than 90% of the theoretical peak performance of the POWER2 node. Compared to standard block algorithms, the recursive approach also shows a significant advantage in the automatic tuning obtained from its automatic variable blocking. A successful parallel implementation on a four-way 332-MHz IBM PPC604e SMP node based on dynamic load balancing is presented. For two, three, and four processors it shows speedups of up to 1.97, 2.99, and 3.97.
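The pure recursion described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' hybrid algorithm or their implementation: it only shows how splitting the columns in half lets the update of the right half, and the merge of the two triangular factors, be written as matrix-matrix (Level 3) products. The function name `recursive_qr`, the compact representation Q = I - Y T Y^T, and the requirement m >= n are assumptions made for this illustration.

```python
import numpy as np

def recursive_qr(A):
    """Illustrative recursive QR: returns (Y, T, R) with Q = I - Y T Y^T
    and A = Q[:, :n] @ R. Assumes A is m by n with m >= n."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    if n == 1:
        # Base case: a single Householder reflector H = I - tau * v v^T, v[0] = 1.
        x = A[:, 0]
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            v = np.zeros(m)
            v[0] = 1.0
            return v.reshape(m, 1), np.zeros((1, 1)), np.zeros((1, 1))
        alpha = -np.copysign(norm_x, x[0])  # sign chosen to avoid cancellation
        u = x.copy()
        u[0] -= alpha
        v = u / u[0]
        tau = 2.0 / (v @ v)
        return v.reshape(m, 1), np.array([[tau]]), np.array([[alpha]])

    n1 = n // 2
    n2 = n - n1
    # Factor the left half of the columns recursively.
    Y1, T1, R11 = recursive_qr(A[:, :n1])
    # Apply Q1^T to the right half using matrix-matrix products only.
    A2 = A[:, n1:] - Y1 @ (T1.T @ (Y1.T @ A[:, n1:]))
    R12 = A2[:n1, :]
    # Factor the trailing part of the updated right half recursively.
    Y2_low, T2, R22 = recursive_qr(A2[n1:, :])
    Y2 = np.vstack([np.zeros((n1, n2)), Y2_low])
    # Merge the two triangular factors: T3 = -T1 (Y1^T Y2) T2.
    T3 = -T1 @ (Y1.T @ Y2) @ T2
    Y = np.hstack([Y1, Y2])
    T = np.block([[T1, T3], [np.zeros((n2, n1)), T2]])
    R = np.block([[R11, R12], [np.zeros((n2, n1)), R22]])
    return Y, T, R

# Usage: factor a tall random matrix and rebuild it from the factors.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
Y, T, R = recursive_qr(A)
Q = np.eye(8) - Y @ T @ Y.T
print(np.allclose(Q[:, :5] @ R, A))
```

In this sketch the T factor grows to n by n at the top level of the recursion, which is one way to see the extra update cost for large n that the abstract mentions.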

Similar resources

High-Performance Library Software for QR Factorization

In [5, 6], we presented algorithm RGEQR3, a purely recursive formulation of the QR factorization. Using recursion leads us to a natural way to choose the k-way aggregating Householder transform of Schreiber and Van Loan [10]. RGEQR3 is a performance-critical subroutine for the main (hybrid recursive) routine RGEQRF for QR factorization of a general m by n matrix. This contribution presents a new ...

A high-performance algorithm for the linear least squares problem on SMP systems

We present new recursive serial and parallel algorithms for the linear least squares problem AX = B, where A is m by n, m ≥ n. The algorithms improve performance. This work is an extension of our work on QR factorization [4]. The key idea is to combine the computation of Q^T B with the QR factorization, thereby saving computations compared to the standard LAPACK algorithm. Recursion allows us to r...

New Serial and Parallel Recursive QR Factorization Algorithms for SMP Systems

We present a new recursive algorithm for the QR factorization of an m by n matrix A. The recursion leads to an automatic variable blocking that allows us to replace a Level 2 part in a standard block algorithm with Level 3 operations. However, there are some additional costs for performing the updates, which prohibit the efficient use of the recursion for large n. This obstacle is overcome by using...

Parallel Algorithms for Toeplitz Systems

We describe some parallel algorithms for the solution of Toeplitz linear systems and Toeplitz least squares problems. First we consider the parallel implementation of the Bareiss algorithm (which is based on the classical Schur algorithm). The alternative Levinson algorithm is less suited to parallel implementation because it involves inner products. The Bareiss algorithm computes the LU factor...

Enhancing Parallelism of Tile QR Factorization for Multicore Architectures

To exploit the potential of multicore architectures, recent dense linear algebra libraries have used tile algorithms, which consist of scheduling a Directed Acyclic Graph (DAG) of fine granularity tasks where nodes represent tasks, either panel factorization or update of a block-column, and edges represent dependencies among them. Although past approaches already achieve high performance on mod...

Journal:

IBM Journal of Research and Development

Volume 44 (issue not listed)

Pages: not listed

Publication date: 2000
